## Is it Memory or Logic? Blurring the Gap

### Prof. Vijaykrishnan Narayanan, The Pennsylvania State University

In Collaboration with

Professors Suman Datta, Sayeef Salahuddin, Sumeet Gupta, Sharon Hu, Michael Niemier, (Marvin) Mei-Fan Chang

Supported in part by NSF Expeditions in Computing, DARPA/SRC LEAST Center, NSF ASSIST ERC

September 2017





Why can't you get smarter as well?



Offloading computation to memory reduces data movement from distant memory to the processor.

# Overview

- Monolithic 3D Integration
- Configurable Memory-Logic Device
- Cross point arrays

#### **3D Integration: Technology and Evolution**



#### **Compute-Oriented Caches**



#### Concurrent row and column accessible 3D memory



2.15x access time savings

This memory design serves applications that need multi-dimensional data access, improving system performance.

Transforms traditional compiler optimizations
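As a rough illustration of why column access matters, the sketch below counts line activations when reading one column of a small tile. It assumes a row-only memory must activate every row that holds a needed element, while a row-and-column-accessible memory fetches the column in one activation. The `TileMemory` class and its methods are hypothetical, and the counts are not a model of the 2.15x access-time figure.

```python
class TileMemory:
    """Toy N x N memory tile that counts line activations (hypothetical model)."""

    def __init__(self, data, column_access=False):
        self.data = [list(row) for row in data]
        self.column_access = column_access
        self.activations = 0

    def read_row(self, r):
        self.activations += 1              # one word-line activation
        return list(self.data[r])

    def read_column(self, c):
        if self.column_access:
            self.activations += 1          # one column-line activation
            return [row[c] for row in self.data]
        # Row-only memory: activate every row and keep one element of each.
        return [self.read_row(r)[c] for r in range(len(self.data))]


tile = [[r * 4 + c for c in range(4)] for r in range(4)]
row_only = TileMemory(tile)
row_and_column = TileMemory(tile, column_access=True)
```

Reading the same column from both objects returns identical data, but the row-only tile needs one activation per row, which is the kind of overhead a compiler would otherwise hide with loop and layout transformations.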

#### LUT-based in-memory computing



Boolean and arithmetic computations in memory, with a store-back feature
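A minimal sketch of the idea, assuming truth tables are stored in ordinary memory cells and the operands form the lookup address. The `LutMemory` class, its address layout, and the method names are hypothetical and do not describe the actual circuit.

```python
class LutMemory:
    """Toy memory whose cells can also serve as lookup tables (hypothetical)."""

    def __init__(self, size):
        self.cells = [0] * size

    def load_lut(self, base, func, n_inputs):
        """Store the truth table of func at base .. base + 2**n_inputs - 1."""
        for idx in range(2 ** n_inputs):
            bits = [(idx >> i) & 1 for i in range(n_inputs)]
            self.cells[base + idx] = func(*bits)

    def compute(self, base, inputs, dest):
        """Evaluate by lookup: the inputs form the address; store the result at dest."""
        idx = sum(bit << i for i, bit in enumerate(inputs))
        result = self.cells[base + idx]
        self.cells[dest] = result          # store-back into the same memory
        return result


mem = LutMemory(32)
mem.load_lut(0, lambda a, b: a ^ b, 2)                             # Boolean: XOR
mem.load_lut(8, lambda a, b, cin: a ^ b ^ cin, 3)                  # full-adder sum
mem.load_lut(16, lambda a, b, cin: (a & b) | (cin & (a ^ b)), 3)   # full-adder carry
```

Calling `mem.compute(8, [1, 1, 0], dest=25)` looks up the sum bit of 1 + 1 and writes it back to cell 25 without the operands leaving the memory model.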

#### Code Compilation for in-memory compute



39% of the executions were done in memory

Where to do the compute – memory or logic?

#### Rapid reconfigurable monolithic 3D FPGA (what FPGA)



# Overview

- Monolithic 3D Integration
- Configurable Memory-Logic Device
- Cross point arrays

### New Pathway: Integration of NVM and Processor

#### Power Source Driven -- Frequent backup and restores (B&R)

- e.g. Energy-harvesting computing systems  $\rightarrow$  intermittent power supply
- e.g. Low stand-by power systems  $\rightarrow$  complete fine-grain power-gating
- Key: reducing energy overhead!







#### **Event Driven -- Quick response in normally-off Applications**

- Key: low-latency restore





#### **NCFET Modeling and Evaluation:** From Physics to Devices

- **Physics-based:** Employing time dependent LK equations solved self-consistently with MOSFET equations
- **Circuit compatible:** SPICE based model enables seamless integration of the model with circuit simulators
- Enables efficient device-circuit co-design and analysis from materials through circuits.



#### **Focus: Memory-Logic Integration**

[Figure: current (A, log scale) versus gate voltage (V). Theory panel: T<sub>FE</sub> = 8 nm. Experiments panel: curves labeled |V\_0| = 50 mV, 0.5 V, and 0.9 V.]



#### New Opportunities

1. Memory-Logic Integration
2. NVM: scalable, low energy

#### **Experimental Results from Prof. Salahuddin**

Xueqing Li et al, IEEE TED, vol. 64, no. 8, Aug. 2017; Patent filed;

CENTER FOR LOW ENERGY SYSTEMS TECHNOLOGY



# **Compact Model**

#### **Key Features**

- Physics-based: Employing time dependent LK equations solved self-consistently with MOSFET equations
- Circuit compatible: SPICE based model enables seamless integration of the model with testbenches. Compatible with existing commercial circuit simulation tools
- Enables efficient device-circuit co-design and analysis from materials through circuits.



# **Compact Model: Initialization**

Example simulation  $(V_{GS} \text{ sweep } @ V_{DS} = 0.7V)$ 



- Initialization is required to correctly capture the effect of gate and drain voltages on FE polarization.
- Device simulations are started with all voltages (V<sub>GS</sub>, V<sub>DS</sub>) at 0. The voltages are then ramped to desired values to capture the polarization trajectory.
- Circuit simulations are started by ramping the supply rail from 0 to V<sub>DD</sub> and then applying the input voltages.
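The start-up sequence described above can be sketched as piecewise-linear (PWL) waveforms: the rail ramps from 0 first, and the input is applied only afterwards. The helper names, the 0.7 V supply, and all timing values are assumptions for illustration, not settings from the compact model.

```python
def pwl_ramp(t_start, t_ramp, v_final):
    """Breakpoints for a waveform held at 0 V until t_start, then ramped to v_final."""
    return [(0.0, 0.0), (t_start, 0.0), (t_start + t_ramp, v_final)]


def pwl_value(points, t):
    """Linearly interpolate a PWL waveform at time t (holds the last value afterwards)."""
    if t <= points[0][0]:
        return points[0][1]
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if t0 <= t <= t1:
            if t1 == t0:
                return v1
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    return points[-1][1]


VDD = 0.7                               # assumed supply voltage, volts
supply = pwl_ramp(0.0, 1e-9, VDD)       # rail ramps from 0 to VDD first
vin = pwl_ramp(2e-9, 0.5e-9, VDD)       # input applied after the rail has settled
```

Starting every node at 0 V and ramping in this order lets the simulator follow the polarization trajectory instead of guessing an initial polarization state.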



**Model Calibration** 

$$E - \rho \frac{dP}{dt} = \alpha P + \beta P^3 + \gamma P^5$$





- $\alpha = -1.05 \times 10^9 \text{ m/F}$
- $\beta = 1 \times 10^7 \text{ m}^5/\text{F/C}^2$
- $\gamma = 6 \times 10^{11} \text{ m}^9/\text{F/C}^4$

Static coefficients extracted from calibration against experiments

Behavior of the model with respect to varying input frequencies consistent with other works (Kobayashi et al, VLSI Tech 2015)

Aziz et al, EDL 2016
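To make the time-dependent LK equation concrete, the sketch below integrates $\rho \, dP/dt = E - (\alpha P + \beta P^3 + \gamma P^5)$ for the ferroelectric alone, using the static coefficients listed above. The forward-Euler scheme, the value of $\rho$ (taken here as 0.25 in SI units of ohm-metres), the time step, and the initial polarization are assumptions. The actual compact model is SPICE based and solved self-consistently with the MOSFET equations, which this sketch does not attempt.

```python
import math

ALPHA = -1.05e9   # m/F, from the slide
BETA = 1e7        # m^5/F/C^2, from the slide
GAMMA = 6e11      # m^9/F/C^4, from the slide


def lk_field(p):
    """Static LK field alpha*P + beta*P^3 + gamma*P^5 (V/m) for polarization p (C/m^2)."""
    return ALPHA * p + BETA * p**3 + GAMMA * p**5


def simulate_lk(e_field, p0, rho, dt, steps):
    """Forward-Euler integration of rho * dP/dt = E(t) - lk_field(P)."""
    p = p0
    for n in range(steps):
        p += dt * (e_field(n * dt) - lk_field(p)) / rho
    return p


def remnant_polarization():
    """Positive root of alpha + beta*P^2 + gamma*P^4 = 0 (zero-field steady state)."""
    disc = math.sqrt(BETA**2 - 4.0 * GAMMA * ALPHA)
    return math.sqrt((-BETA + disc) / (2.0 * GAMMA))


# Assumed settings: rho = 0.25 ohm*m, 1 ps step, 10 ns of relaxation at zero field.
p_final = simulate_lk(lambda t: 0.0, p0=0.01, rho=0.25, dt=1e-12, steps=10_000)
```

At zero field a small positive seed relaxes to the positive remnant polarization and a negative seed to its mirror image, which is why the simulated result depends on how the voltages were ramped.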

#### **Compact Model: Capturing Different Modes of Operation**



By varying the parameters (LK parameters, FE thickness (T<sub>FE</sub>), oxide metrics of the underlying transistor), steep switching, non-volatile and anti-ferroelectric behavior can be obtained

#### **FeFET Logic Design Benchmarking**



FeFET Id-Vgs comparison with MOSFET



For an inverter in seven-stage RO  $C_W = 0, T_{FE} = 5 \text{ nm}$ 

**STAR**net


S. Gupta et al, IEEE TED, Aug. 2017

#### **nvDFF: Concept and Significance**



- Low backup and restore (B&R) energy and latency;
- On-demand or automatic B&R
- Other concerns: area, control complexity, retention time, circuit interface, process compatibility, etc.



#### **Restore Operation**



#### **nvDFF1: Edge Computing with FeFET – Benchmarking**

#### Normal operation overhead

#### **Performance benchmarking**



|                             | [10]    | [9]            | [11]                       | This Work                 |        |         |
|-----------------------------|---------|----------------|----------------------------|---------------------------|--------|---------|
|                             | Measured | Simulated     | Simulated<sup>&amp;</sup>  | Simulated\*               |        |         |
| Tech. size                  | 130 nm  | 70 nm          | 180 nm                     | 10 nm                     |        |         |
| Voltage                     | 1.5 V   | 1.0 V          | 1.8 V                      | 0.3 V-0.8 V               |        |         |
| Material                    | PZT Cap | MTJ            | ReRAM                      | 6 nm HfO<sub>2</sub>, PZT |        |         |
| ρ                           |         |                |                            | ρ=0.04                    | ρ=0.10 | ρ=0.25  |
| T<sub>Backup+Restore</sub>  | 2.67 µs | >10 µs         | 1.3 µs                     | 277 ps                    | 583 ps | 1.29 ns |
| E<sub>Backup+Restore</sub>  | 2.40 pJ | 382 fJ         | 735 fJ                     | 1.38 fJ                   |        |         |
| Break-Even Time             | /       | 0.83 µs @ 25 ℃ | 1.47 ms                    | 55.9 ns                   |        |         |

<sup>&amp;</sup>: The results are for the topology of NVFF-I in [11] operating at a 0.8 V supply (raised to 2.4 V for the ReRAM write) for the shortest break-even time.

\*: Backup and restore performance in this table is simulated at 0.5V supply.

- ✓ Ultra-low E<sub>B&R</sub>
- ✓ Ultra-low latency
- ✓ Low normal-operation overhead
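The scale of the gap can be checked with simple arithmetic on the table entries. The sketch below takes the backup-plus-restore time and energy of [10] and [11] and divides them by the first "This Work" column. Entry [9] is left out because its time is only a lower bound. The dictionary layout and function name are illustrative, and since the designs use different technology nodes and supply voltages, the ratios are indicative only.

```python
# Backup+restore figures copied from the table (time in s, energy in J).
prior = {
    "[10]": {"t": 2.67e-6, "e": 2.40e-12},
    "[11]": {"t": 1.3e-6, "e": 735e-15},
}
this_work = {"t": 277e-12, "e": 1.38e-15}   # first "This Work" column


def improvement(reference, candidate):
    """Return (time ratio, energy ratio) of reference over candidate."""
    return reference["t"] / candidate["t"], reference["e"] / candidate["e"]


ratios = {name: improvement(entry, this_work) for name, entry in prior.items()}
```

Both ratios come out in the hundreds to thousands for each prior design, consistent with the "ultra-low" labels on this slide.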


Xueqing Li et al, IEEE TED, Aug. 2017; Patent filed;



#### nvDFF2: Intrinsically Nonvolatile DFF



- **1.** Backup/restore features
  - ✓ Fast: done in sub-ns (1000x);
  - ✓ Low-energy: < 2.4 fJ @ 0.8 V (1000x);
  - ✓ Autonomous: no external control needed
- **2.** Normal operation features
  - ✓ Fast: GHz operation;
  - EDP overhead: ~35% (x4 fan-out)
- **3.** Dense (only 2 added transistors), low-voltage, scalable, CMOS compatible

Compared with FeCAP solution.




Xueqing Li et al, IEEE Trans. CAS-I, vol.PP, no.99, pp.1-13; Patent filed;

#### **nvSRAM: Enabling Nonvolatile Computing with FeFET NVM**






**No-DC-current operation** $\rightarrow$ ~600x savings in $E_{B\&R}$ and break-even time! Enables finer-grain power-gating with significantly lower BET.

Break-even time (BET) is the longest standby time over which a volatile SRAM could sustain its leakage if given an amount of energy equal to $E_{B\&R}$.
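Read as a formula, this definition gives BET as the backup-and-restore energy divided by the standby leakage power. The sketch below is a minimal version of that calculation; the function names and the numeric values are hypothetical and are not taken from the slides.

```python
def break_even_time(e_backup_restore, p_leak):
    """Standby time (s) at which leakage energy equals the backup+restore energy."""
    if p_leak <= 0:
        raise ValueError("leakage power must be positive")
    return e_backup_restore / p_leak


def should_power_gate(standby_time, e_backup_restore, p_leak):
    """Power-gating saves energy only when standby outlasts the break-even time."""
    return standby_time > break_even_time(e_backup_restore, p_leak)


# Hypothetical numbers: 1 fJ of backup+restore energy, 10 nW of leakage.
bet = break_even_time(1e-15, 10e-9)
```

A lower backup-and-restore energy shortens BET, so shorter idle intervals become worth power-gating.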

Xueqing Li et al, IEEE TED, July 2017; Patent filed




#### **Symmetric FeFET NVM:** Flexible Data Analytics



#### **Recent Progress: Experimental FeFET NVM Circuit**





- 10 nm HZO/0.8 nm SiO<sub>2</sub>/p-Si gate stack
- 40  $\mu$ m / 2  $\mu$ m with 2  $\mu$ m gate overlap with S/D






Ongoing collaboration work with Prof. Datta

# Overview

- Monolithic 3D Integration
- Configurable Memory-Logic Device
- Cross point arrays

### **Cross-Point Peripheral Reconfigurability: Circuit-Architecture co-design**

Each X-point array Reconfigurable as:



Multiple Arrays To Form Programmable Unified System



# Study of selector for X-point

Cross-point memories require selectors to eliminate current sneak paths.
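A simplified resistive sketch shows why the sneak path matters. It assumes a 2x2 array with floating unselected lines, so the sneak path runs through the three unselected cells in series and sits in parallel with the selected cell. Each unselected cell's selector is modeled as an extra series resistance. The resistance values and function names are assumptions chosen for illustration.

```python
def parallel(r1, r2):
    """Equivalent resistance of two resistors in parallel."""
    return r1 * r2 / (r1 + r2)


def apparent_resistance(r_selected, r_neighbor, r_selector_off=0.0):
    """Resistance seen when reading one cell of a 2x2 array with floating lines.

    The sneak path is three unselected cells in series; an off-state selector
    adds r_selector_off in series with each of them (simplified linear model).
    """
    sneak = 3.0 * (r_neighbor + r_selector_off)
    return parallel(r_selected, sneak)


R_HRS, R_LRS = 1e6, 1e4     # assumed high/low cell resistances, ohms
no_selector = apparent_resistance(R_HRS, R_LRS)
with_selector = apparent_resistance(R_HRS, R_LRS, r_selector_off=1e9)
```

Without a selector the high-resistance cell reads as only a few percent of its true value because the low-resistance neighbours shunt it. With the assumed off-state selector resistance the reading stays within 1% of the cell's own resistance.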



# Summary

- New generation of design automation tools that drive the innovation in devices
- New computational models will create additional search dimensions for exploration tools
- Models to support circuit-architecture explorations
- Software design stack and protocols co-designed with device fabrics compound benefits
- Leaves room for several interesting questions, which large funded projects in many other countries are also pursuing
- Endurance, feature/voltage scaling, additional features: integrated sensing-compute structures, relative progress of other competing memory technologies.